Refactor jsx mode in parser #7751

nojaf · 2025-08-02T10:15:49Z

While experimenting with the JSX part of the parser, I noticed that we have a somewhat convoluted mechanism for handling /> and - in identifier names.

First, I created a token dump tool in res_parser, which was previously missing.

Currently, the parser calls Scanner.set_jsx_mode p.Parser.scanner; and Scanner.pop_mode p.scanner Jsx; in sequence while processing elements, to distinguish between parsing JSX and non-JSX code. However, the mode is mainly used in the following cases:

Allowing a - inside an identifier. This logic belongs in the parser, but it currently resides in the scanner, which feels inappropriate.
Combining a < + / into a </ token. I would prefer using a lookahead for when a < is encountered. This would clarify that it's specific to JSX parsing. Even though LessThanSlash exists, there is still a separate LessThan + Slash check, which makes the code a bit messy.
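To make the two cases above concrete, here is a minimal sketch of what character-level lookahead could look like. The `scanner` record and the `peek_char`, `is_closing_tag_start` and `read_jsx_name` functions are hypothetical names invented for this illustration; they are not the code in res_scanner.ml or res_core.ml. The sketch also assumes that `<` and `/` must be adjacent to count as a closing tag, which is a choice made here and not necessarily what the PR does.

```ocaml
(* Toy scanner: a source string plus a cursor.
   Hypothetical, not the record used in res_scanner.ml. *)
type scanner = {src : string; mutable pos : int}

(* Look at the character [offset] positions ahead without consuming it. *)
let peek_char s offset =
  let i = s.pos + offset in
  if i < String.length s.src then Some s.src.[i] else None

(* True when the cursor sits on '<' immediately followed by '/'.
   Assumption of this sketch: whitespace between the two is not allowed. *)
let is_closing_tag_start s =
  peek_char s 0 = Some '<' && peek_char s 1 = Some '/'

let is_ident_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' -> true
  | _ -> false

(* Read a JSX name such as "data-test-id": identifier segments joined by '-'.
   A '-' is consumed only after at least one identifier character and only
   when another identifier character follows it. *)
let read_jsx_name s =
  let buf = Buffer.create 16 in
  let rec segment () =
    match peek_char s 0 with
    | Some c when is_ident_char c ->
      Buffer.add_char buf c;
      s.pos <- s.pos + 1;
      segment ()
    | Some '-' when Buffer.length buf > 0 -> (
      match peek_char s 1 with
      | Some c when is_ident_char c ->
        Buffer.add_char buf '-';
        s.pos <- s.pos + 1;
        segment ()
      | _ -> ())
    | _ -> ()
  in
  segment ();
  Buffer.contents buf
```

Because both helpers only peek, the decision stays local to the JSX code path and needs no scanner mode to be switched on or off.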

This is an effort to streamline the JSX mode.

PS: to run the local analysis tests, I had to revert to legacy clean. This is probably for the best until we figure out #7707.

nojaf · 2025-08-02T10:17:59Z

compiler/syntax/src/res_core.ml

    | LessThan ->
      (* Imagine: <div> <Navbar /> <
       * is `<` the start of a jsx-child? <div …
       * or is it the start of a closing tag?  </div>
       * reconsiderLessThan peeks at the next token and
       * determines the correct token to disambiguate *)
-      let token = Scanner.reconsider_less_than p.scanner in


This is what bothers me a bit: there is a LessThanSlash case above, yet we still need the reconsider_less_than call.

nojaf · 2025-08-02T10:19:31Z

compiler/syntax/src/res_core.ml

        let attr_expr = parse_primary_expr ~operand:(parse_atomic_expr p) p in
        Some (Parsetree.JSXPropValue ({txt = name; loc}, optional, attr_expr))
      | _ -> Some (Parsetree.JSXPropPunning (false, {txt = name; loc})))
  (* {...props} *)
  | Lbrace -> (
-    Scanner.pop_mode p.scanner Jsx;


This is rather confusing in a nested JSX scenario:

<div> <p> {foo} </p> </div>

Popping Jsx for <p> is not enough: the pop for <div> also has to happen before the scanner is out of Jsx mode.
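The nesting problem can be reproduced with a toy mode stack. This is a sketch under assumptions: the `scanner` record, the `mode` type and the bodies of `set_jsx_mode` and `pop_mode` below are simplified stand-ins written for this illustration, not the actual definitions in res_scanner.ml.

```ocaml
(* Simplified stand-in for the scanner's mode stack (assumed shape). *)
type mode = Jsx | Diamond

type scanner = {mutable mode : mode list}

(* Entering a JSX element pushes one Jsx entry. *)
let set_jsx_mode s = s.mode <- Jsx :: s.mode

(* Pop [m] only if it is on top of the stack; otherwise do nothing. *)
let pop_mode s m =
  match s.mode with
  | top :: rest when top = m -> s.mode <- rest
  | _ -> ()

let in_jsx_mode s = List.mem Jsx s.mode

(* Walk through <div> <p> {foo} </p> </div> up to the '{'. *)
let still_in_jsx_after_one_pop () =
  let s = {mode = []} in
  set_jsx_mode s; (* <div> *)
  set_jsx_mode s; (* <p> *)
  pop_mode s Jsx; (* '{' pops the entry pushed for <p> *)
  in_jsx_mode s (* the entry pushed for <div> is still there *)
```

In this model a single pop at `{` leaves the entry for the outer element on the stack, so whether the scanner is really out of JSX mode depends on how deeply the elements are nested.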

nojaf · 2025-08-02T11:55:12Z

tests/analysis_tests/tests/src/expected/CompletionJsx.res.txt

-posCursor:[30:12] posNoWhite:[30:11] Found expr:[30:9->32:10]
-JSX <di:[30:10->30:12] div[32:6->32:9]=...[32:6->32:9]> _children:None
+posCursor:[30:12] posNoWhite:[30:11] Found expr:[30:9->30:12]
+JSX <di:[30:10->30:12] > _children:None


This change is because the AST is slightly different for:

<div> <di </div>

It used to be

[
  structure_item (A.res[1,0+0]..[3,12+6])
    Pstr_eval
    expression (A.res[1,0+0]..[3,12+6])
      Pexp_jsx_container_element "div" (A.res[1,0+1]..[1,0+4])
      jsx_props =
        []
      > [1,0+4]
      jsx_children =
        [
          expression (A.res[2,6+2]..[3,12+6])
            Pexp_jsx_container_element "di" (A.res[2,6+3]..[2,6+5])
            jsx_props =
              [
                div
              ]
            > [3,12+5]
            jsx_children =
              []
        ]
]

and now is

[
  structure_item (A.res[1,0+0]..[3,12+6])
    Pstr_eval
    expression (A.res[1,0+0]..[3,12+6])
      Pexp_jsx_container_element "div" (A.res[1,0+1]..[1,0+4])
      jsx_props =
        []
      > [1,0+4]
      jsx_children =
        [
          expression (A.res[2,6+2]..[2,6+5])
            Pexp_jsx_unary_element "di" (A.res[2,6+3]..[2,6+5])
            jsx_props =
              []
        ]
]

I think this is more correct. Unary (new) versus container (old) doesn't matter that much; neither can be determined for <di.
However, the container had a weird prop div, which it no longer has.

pkg-pr-new · 2025-08-02T11:58:18Z

rescript

npm i https://pkg.pr.new/rescript-lang/rescript@7751

@rescript/darwin-arm64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/darwin-arm64@7751

@rescript/darwin-x64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/darwin-x64@7751

@rescript/linux-arm64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/linux-arm64@7751

@rescript/linux-x64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/linux-x64@7751

@rescript/win32-x64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/win32-x64@7751

commit: 452301a

nojaf · 2025-08-04T14:03:19Z

Hi @cristianoc, sorry for my eagerness, could you take a look at these changes?

cristianoc · 2025-08-04T18:03:11Z

Are there differences in whitespace behavior (for composite tokens)? If so, are they intended? Or possible ambiguities when single characters are tokenised in isolation.
Traveling so could not take a look in detail, but these are the things that come to mind.

nojaf · 2025-08-04T18:12:29Z

> Or possible ambiguities when single characters are tokenised in isolation.

No, actually not, there is no other way the language can encounter </ besides JSX.
So that made me wonder if we needed the jsx mode in the first place.

Safe travels, will ask someone else for a review.
(Don't be shy to take a look at this once you are back 😇)

@zth , @shulhi , @aspeddro any volunteers?

Copilot

Pull Request Overview

This PR refactors the JSX mode handling in the ReScript parser by removing the convoluted JSX mode mechanism from the scanner and moving JSX-specific logic to the parser. The refactoring streamlines JSX parsing by eliminating the need for Scanner.set_jsx_mode and Scanner.pop_mode calls throughout the codebase, and introduces lookahead functionality for better JSX token handling.

Key changes:

Removes JSX mode from the scanner and replaces it with parser-level JSX identifier handling
Introduces lookahead functions (peekMinus, peekSlash) for better JSX token disambiguation
Adds a token debugger tool for development purposes

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`compiler/syntax/src/res_scanner.ml`	Removes JSX mode handling and adds lookahead functions for minus and slash characters
`compiler/syntax/src/res_core.ml`	Implements JSX identifier parsing in the parser with lookahead-based logic
`compiler/syntax/src/res_token.ml`	Removes LessThanSlash token type
`compiler/syntax/src/res_token_debugger.ml`	Adds new token debugging utility
`tests/syntax_tests/data/parsing/errors/expressions/expected/jsx.res.txt`	Updates expected error message
Various package.json files	Reverts to legacy clean command for analysis tests

Copilot · 2025-08-05T13:10:05Z

compiler/syntax/src/res_core.ml

+    p.token <- token
+  | Uident txt when Scanner.peekMinus p.scanner ->
+    let buffer = Buffer.create (String.length txt) in
+    Buffer.add_string buffer txt;
+    Parser.next p;
+    let name = visit buffer |> Buffer.contents in
+    let token = Token.Uident name in
+    p.token <- token


[nitpick] Direct mutation of parser state (p.token <-) breaks encapsulation and makes the code harder to reason about. Consider using a proper parser method or returning the modified token instead of mutating parser state directly.

Suggested change

-    p.token <- token
-  | Uident txt when Scanner.peekMinus p.scanner ->
-    let buffer = Buffer.create (String.length txt) in
-    Buffer.add_string buffer txt;
-    Parser.next p;
-    let name = visit buffer |> Buffer.contents in
-    let token = Token.Uident name in
-    p.token <- token
+    set_token p token
+  | Uident txt when Scanner.peekMinus p.scanner ->
+    let buffer = Buffer.create (String.length txt) in
+    Buffer.add_string buffer txt;
+    Parser.next p;
+    let name = visit buffer |> Buffer.contents in
+    let token = Token.Uident name in
+    set_token p token

tests/syntax_tests/data/parsing/errors/expressions/expected/jsx.res.txt

cristianoc · 2025-08-06T11:43:52Z

Hitting AI with AI

https://chatgpt.com/share/68933fb4-cef4-8011-a04c-a30808a82b89

Jump to the end for the relevant summary

nojaf added 8 commits August 1, 2025 20:35

Add token dump printer

2699aa7

Remove ForwardSlash token

b60bcae

Document RESCRIPT_BSC_EXE for local usage

0e8b542

Process hypens in parse_module_long_ident for jsx

7d6a563

Update test snapshot

b0f4806

Add how to view tokens.

50d46ef

fmt

849d19f

Clean up

28cfceb

nojaf commented Aug 2, 2025

nojaf added 3 commits August 2, 2025 12:27

Correct fragment range

bd46a3e

Use legacy clean for analysis projects

8bb4e41

Update analysis snapshot

9d8641e

nojaf commented Aug 2, 2025

nojaf marked this pull request as ready for review August 2, 2025 12:00

nojaf requested a review from cristianoc August 2, 2025 12:00

nojaf changed the title ~~Add token dump printer~~ Refactor jsx mode in parser Aug 2, 2025

nojaf requested review from Copilot and removed request for cristianoc August 4, 2025 18:12

Copilot AI reviewed Aug 5, 2025

nojaf added 2 commits August 5, 2025 15:34

Copilot review suggestion

69054c5

Return original error directly

452301a
